privacy budget
Private Adaptive Covariance Estimation via Gaussian Graphical Models
Ferrando, Cecilia, Fuentes, Miguel, Mullins, Brett, Musco, Cameron, Sheldon, Daniel
We propose PACE-GGM, a data-adaptive differentially private method for covariance estimation that concentrates its privacy budget on the most informative entries of the empirical covariance matrix, rather than perturbing all entries. This applies in the natural setting where the modeler supplies separate bounds for each variable, so that individual entries can be measured with less noise than the full matrix. In each round, our method selects a poorly approximated entry, measures it using the Gaussian mechanism, and then reconstructs a full covariance matrix using a maximum-entropy reconstruction objective, leading to a Gaussian graphical model structure. Experiments on diverse real-world datasets demonstrate consistent improvements in estimation error with respect to the Gaussian mechanism and other baselines, particularly in high-dimensional and low-to-moderate privacy regimes.
In Differential Privacy, There is Truth: On Vote Leakage in Ensemble Private Learning
When learning from sensitive data, care must be taken to ensure that training algorithms address privacy concerns. The canonical Private Aggregation of Teacher Ensembles, or PATE, computes output labels by aggregating the predictions of a (possibly distributed) collection of teacher models via a voting mechanism. The mechanism adds noise to attain a differential privacy guarantee with respect to the teachers' training data. In this work, we observe that this use of noise, which makes PATE predictions stochastic, enables new forms of leakage of sensitive information. For a given input, our adversary exploits this stochasticity to extract high-fidelity histograms of the votes submitted by the underlying teachers. From these histograms, the adversary can learn sensitive attributes of the input such as race, gender, or age. Although this attack does not directly violate the differential privacy guarantee, it clearly violates privacy norms and expectations, and would not be possible at all without the noise inserted to obtain differential privacy. In fact, counter-intuitively, the attack becomes easier as we add more noise to provide stronger differential privacy. We hope this encourages future work to consider privacy holistically rather than treat differential privacy as a panacea.
Differentially Private Linear Sketches: Efficient Implementations and Applications
Linear sketches have been widely adopted to process fast data streams, and they can be used to accurately answer frequency estimation, approximate top K items, and summarize data distributions. When data are sensitive, it is desirable to provide privacy guarantees for linear sketches to preserve private information while delivering useful results with theoretical bounds. We show that linear sketches can ensure privacy and maintain their unique properties with a small amount of noise added at initialization. From the differentially private linear sketches, we showcase that the state-of-the-art quantile sketch in the turnstile model can also be private and maintain high performance. Experiments further demonstrate that our proposed differentially private sketches are quantitatively and qualitatively similar to noise-free sketches with high utilization on synthetic and real datasets.
GlucoSynth: Generating Differentially-Private Synthetic Glucose Traces Anonymous Author(s) Affiliation Address email
We focus on the problem of generating high-quality, private synthetic glucose1 traces, a task generalizable to many other time series sources. Existing methods for2 time series data synthesis, such as those using Generative Adversarial Networks3 (GANs), are not able to capture the innate characteristics of glucose data and cannot4 provide any formal privacy guarantees without severely degrading the utility of the5 synthetic data. In this paper we present GlucoSynth, a novel privacy-preserving6 GAN framework to generate synthetic glucose traces. The core intuition behind our7 approach is to conserve relationships amongst motifs (glucose events) within the8 traces, in addition to temporal dynamics. Our framework incorporates differential9 privacy mechanisms to provide strong formal privacy guarantees.
Equivalence Testing Under Privacy Constraints
Pareek, Savita, Insolia, Luca, Molinari, Roberto, Guerrier, Stéphane
Protecting individual privacy is essential across research domains, from socio-economic surveys to big-tech user data. This need is particularly acute in healthcare, where analyses often involve sensitive patient information. A typical example is comparing treatment efficacy across hospitals or ensuring consistency in diagnostic laboratory calibrations, both requiring privacy-preserving statistical procedures. However, standard equivalence testing procedures for differences in proportions or means, commonly used to assess average equivalence, can inadvertently disclose sensitive information. To address this problem, we develop differentially private equivalence testing procedures that rely on simulation-based calibration, as the finite-sample distribution is analytically intractable. Our approach introduces a unified framework, termed DP-TOST, for conducting differentially private equivalence testing of both means and proportions. Through numerical simulations and real-world applications, we demonstrate that the proposed method maintains type-I error control at the nominal level and achieves power comparable to its non-private counterpart as the privacy budget and/or sample size increases, while ensuring strong privacy guarantees. These findings establish a reliable and practical framework for privacy-preserving equivalence testing in high-stakes fields such as healthcare, among others.
On the Optimal Number of Grids for Differentially Private Non-Interactive $K$-Means Clustering
Muthukrishnan, Gokularam, Tandon, Anshoo
Differentially private $K$-means clustering enables releasing cluster centers derived from a dataset while protecting the privacy of the individuals. Non-interactive clustering techniques based on privatized histograms are attractive because the released data synopsis can be reused for other downstream tasks without additional privacy loss. The choice of the number of grids for discretizing the data points is crucial, as it directly controls the quantization bias and the amount of noise injected to preserve privacy. The widely adopted strategy selects a grid size that is independent of the number of clusters and also relies on empirical tuning. In this work, we revisit this choice and propose a refined grid-size selection rule derived by minimizing an upper bound on the expected deviation in the K-means objective function, leading to a more principled discretization strategy for non-interactive private clustering. Compared to prior work, our grid resolution differs both in its dependence on the number of clusters and in the scaling with dataset size and privacy budget. Extensive numerical results elucidate that the proposed strategy results in accurate clustering compared to the state-of-the-art techniques, even under tight privacy budgets.